1 Introduction

Multi-relational data, which refers to graphs whose nodes represent entities and whose edges correspond to
relations that link these entities, plays a pivotal role in many areas such as recommender systems, the
Semantic Web, or computational biology. Relations are modeled as triplets of the form (subject, relation,
object), where a relation models the relationship either between two entities or between an entity and
an attribute value; relations are thus of several types. Despite their appealing ability to represent
complex data, multi-relational graphs remain complicated to manipulate for several reasons (noise,
heterogeneity, large-scale dimensions, etc.), and conveniently representing, summarizing or de-noising
this kind of data is now a central challenge in statistical relational learning.

In this work, we propose a new model to learn multi-relational semantics, that is, to encode
multi-relational graphs into representations that capture the inherent complexity in the data, while
seamlessly defining similarities among entities and relations and providing predictive power. Our work is
based on an original energy function, which is trained to assign low energies to plausible triplets of a
multi-relational graph. This energy function, termed semantic matching energy, relies on a compact
distributed representation: all elements (entities and relation types) are represented in the same
relatively low-dimensional (e.g. 50) embedding vector space. The embeddings are learnt by a neural network
whose particular architecture and training process force them to capture the structure implicit in the
training data and to generalize the graph formed from training triplets. Unlike in previous work [6] [5],
in this model relation types are modeled in the same way as entities. In this way, entities can also play
the role of relation types, as in natural language for instance, and fewer parameters are required when
the number of relation types grows. We show empirically that this model achieves competitive results on
benchmark tasks of link prediction, i.e., generalizing outside of the set of given valid triplets.

2 Semantic Matching Energy Function 

This work considers multi-relational databases as graph models. To each individual node of the graph
corresponds an element of the database, which we term an entity, and each link defines a relation between
entities. Relations are directed and there are typically several different kinds of relations. Let
$\mathcal{C}$ denote the dictionary which includes all entities and relation types, and let
$\mathcal{R} \subset \mathcal{C}$ be the subset of entities which are relation types. A relation is
denoted by a triplet (lhs, rel, rhs), where lhs is the left entity, rhs the right one and rel the type of
relation between them.



2.1 Main ideas 

The main ideas behind our semantic matching energy function are the following. 



• Named symbolic entities (entities and relation types) are associated with a $d$-dimensional vector
space, termed the "embedding space". The $i$-th entity is assigned a vector $E_i \in \mathbb{R}^d$. Note
that more general mappings from an entity to its embedding are possible.

• The semantic matching energy value associated with a particular triplet (lhs, rel, rhs) is computed
by a parametrized function $\mathcal{E}$ that starts by mapping all symbols to their embeddings and then
combines them in a structured fashion. Our model is termed "semantic matching" because $\mathcal{E}$
relies on a matching criterion computed between both sides of the triplet.

• The energy function $\mathcal{E}$ is optimized to be lower for training examples than for other possible
configurations of symbols.
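
The first idea above, a single embedding space shared by entities and relation types, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy symbols, the helper name `build_embeddings`, the dimension d = 4 and the uniform initialization range are all hypothetical and are not taken from the model described here.

```python
import random

def build_embeddings(symbols, d, seed=0):
    """Assign every symbol (entity or relation type) a d-dimensional vector."""
    rng = random.Random(seed)
    return {s: [rng.uniform(-0.1, 0.1) for _ in range(d)] for s in symbols}

# Hypothetical toy dictionary C; the relation types R form a subset of C.
C = ["cat", "dog", "mammal", "is_a", "chases"]
R = {"is_a", "chases"}

# One table for everything: relation types live in the same space as entities.
E = build_embeddings(C, d=4)

# Looking up the three symbols of a triplet (lhs, rel, rhs).
triplet = ("cat", "is_a", "mammal")
E_lhs, E_rel, E_rhs = (E[s] for s in triplet)
```

Because the lookup table is shared, nothing prevents a symbol from appearing as an argument in one triplet and as a relation type in another.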

2.2 Neural network parametrization 

The energy function $\mathcal{E}$ (denoted SME) is encoded using a neural network, whose architecture
first processes each entity in parallel, as in Siamese networks. The intuition is that the relation type
should first be used to extract relevant components from each argument's embedding, and put them in a
space where they can then be compared.

(1) Each symbol of the input triplet (lhs, rel, rhs) is mapped to its embedding
$E_{lhs}, E_{rel}, E_{rhs} \in \mathbb{R}^d$.

(2) The embeddings $E_{lhs}$ and $E_{rel}$, respectively associated with the lhs and rel arguments, are
used to construct a new relation-dependent embedding $E_{lhs(rel)}$ for the lhs in the context of the
relation type represented by $E_{rel}$, and similarly for the rhs:
$E_{lhs(rel)} = g_{left}(E_{lhs}, E_{rel})$ and $E_{rhs(rel)} = g_{right}(E_{rhs}, E_{rel})$,
where $g_{left}$ and $g_{right}$ are parametrized functions whose parameters are tuned during training.
The dimension of $E_{lhs(rel)}$ and $E_{rhs(rel)}$, which we denote $p$, is low but not necessarily equal
to $d$, the dimension of the entity embedding space.

(3) The energy is computed by "matching" the transformed embeddings of the left-hand and right-hand
sides: $\mathcal{E}((lhs, rel, rhs)) = h(E_{lhs(rel)}, E_{rhs(rel)})$, where $h$ is a dot product in our
experiments.
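
As a minimal sketch of steps (1) to (3), the following Python composes an embedding lookup, two relation-dependent maps and a matching function. The names `g_left`, `g_right` and `h` mirror the symbols above; the dictionary `E`, the toy vectors and the element-wise sum used as a stand-in for the g functions are illustrative assumptions, not the trained model. The explicit energies given for the two SME variants carry a leading minus sign, which could be folded into `h`; here `h` is a plain dot product, as stated in step (3).

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def energy(triplet, E, g_left, g_right, h=dot):
    """Generic semantic matching pipeline for one triplet (lhs, rel, rhs)."""
    lhs, rel, rhs = triplet
    # Step (1): map each symbol to its embedding.
    E_lhs, E_rel, E_rhs = E[lhs], E[rel], E[rhs]
    # Step (2): build relation-dependent embeddings for both sides.
    E_lhs_rel = g_left(E_lhs, E_rel)
    E_rhs_rel = g_right(E_rhs, E_rel)
    # Step (3): match the two transformed embeddings.
    return h(E_lhs_rel, E_rhs_rel)

def vec_sum(u, v):
    """Placeholder g (assumption): element-wise sum, so that p = d."""
    return [a + b for a, b in zip(u, v)]

# Hypothetical two-dimensional embeddings for three symbols.
E = {"a": [1.0, 0.0], "r": [0.0, 1.0], "b": [1.0, 1.0]}
value = energy(("a", "r", "b"), E, vec_sum, vec_sum)
```

Keeping `g_left`, `g_right` and `h` as arguments makes it straightforward to swap in different parametrizations of the g functions without touching the pipeline.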

We studied two options for the g functions, which lead to two versions of SME: 

• Linear form (denoted SME(linear)): in this case the g functions are simply linear layers:

$E_{lhs(rel)} = g_{left}(E_{lhs}, E_{rel}) = W_{l1} E_{lhs}^\top + W_{l2} E_{rel}^\top + b_l^\top$

$E_{rhs(rel)} = g_{right}(E_{rhs}, E_{rel}) = W_{r1} E_{rhs}^\top + W_{r2} E_{rel}^\top + b_r^\top$

with $W_{l1}, W_{l2}, W_{r1}, W_{r2} \in \mathbb{R}^{p \times d}$, $b_l, b_r \in \mathbb{R}^p$, and where
$E^\top$ denotes the transpose of $E$. This leads to the energy:

$\mathcal{E}((lhs, rel, rhs)) = -(W_{l1} E_{lhs}^\top + W_{l2} E_{rel}^\top + b_l^\top)^\top (W_{r1} E_{rhs}^\top + W_{r2} E_{rel}^\top + b_r^\top)$

• Bilinear form (denoted SME(bilinear)): in this case the g functions use 3-mode tensors as core weights:

$E_{lhs(rel)} = g_{left}(E_{lhs}, E_{rel}) = (W_l \times_3 E_{rel}^\top) E_{lhs}^\top + b_l^\top$

$E_{rhs(rel)} = g_{right}(E_{rhs}, E_{rel}) = (W_r \times_3 E_{rel}^\top) E_{rhs}^\top + b_r^\top$

with $W_l, W_r \in \mathbb{R}^{p \times d \times d}$ (weights) and $b_l, b_r \in \mathbb{R}^p$ (biases).
$\times_3$ denotes the n-mode vector-tensor product along the 3rd mode. This leads to the following form
for the energy:

$\mathcal{E}((lhs, rel, rhs)) = -((W_l \times_3 E_{rel}^\top) E_{lhs}^\top + b_l^\top)^\top ((W_r \times_3 E_{rel}^\top) E_{rhs}^\top + b_r^\top)$
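
The two parametrizations can be written out on plain Python lists as follows. This is a sketch under assumptions rather than a reference implementation: vectors are flat lists (so the transposes are implicit), matrices are lists of rows, the tensor is a list of p slices of size d x d, and the toy weights and embeddings (d = 2, p = 1) are hypothetical values chosen so that the results are easy to check by hand.

```python
def matvec(M, v):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def g_linear(W1, W2, b, E_arg, E_rel):
    """Linear layer: W1 E_arg + W2 E_rel + b, a length-p vector."""
    return [u + v + c
            for u, v, c in zip(matvec(W1, E_arg), matvec(W2, E_rel), b)]

def g_bilinear(W, b, E_arg, E_rel):
    """Bilinear layer: (W x_3 E_rel) E_arg + b, with W of shape p x d x d."""
    # Contract the third mode of W with E_rel, giving a p x d matrix.
    M = [matvec(W_slice, E_rel) for W_slice in W]
    return [u + c for u, c in zip(matvec(M, E_arg), b)]

def energy(left, right):
    """Negative dot product of the two relation-dependent embeddings."""
    return -sum(a * b for a, b in zip(left, right))

# Toy embeddings and a zero bias (assumed values).
E_lhs, E_rel, E_rhs = [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]
b = [0.0]

# SME(linear) with all four weight matrices equal to [[1, 1]].
W = [[1.0, 1.0]]
e_lin = energy(g_linear(W, W, b, E_lhs, E_rel),
               g_linear(W, W, b, E_rhs, E_rel))

# SME(bilinear) with both tensors equal to a single 2 x 2 identity slice.
T = [[[1.0, 0.0], [0.0, 1.0]]]
e_bil = energy(g_bilinear(T, b, E_lhs, E_rel),
               g_bilinear(T, b, E_rhs, E_rel))
```

In the linear form the relation embedding only shifts each side additively, whereas in the bilinear form it selects the matrix applied to the argument, so the relation interacts multiplicatively with lhs and rhs.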



Table 1: Statistics of datasets used in this paper.

Dataset    Nb. of relation types   Nb. of entities   Nb. of observed relations   % valid relations in obs. ones
UMLS       49                      135               893,025                     0.76
Kinships   26                      104               281,216                     3.84
Nations    56                      14                11,191                      22.9


To train the parameters of the energy function $\mathcal{E}$, we loop over all of the training data
resources and use stochastic gradient descent with a ranking objective inspired by prior work.
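
The text does not spell out the ranking objective, so the following is only one plausible instance: a margin-based hinge loss that asks a training triplet to have lower energy than a corrupted copy of it. The margin of 1, the corruption scheme (replacing lhs or rhs by a random other entity) and the function names `ranking_loss`, `corrupt` and `epoch_loss` are assumptions made for illustration. The gradient step itself is omitted; in practice the loss of each pair would be differentiated with respect to the embeddings and the weights.

```python
import random

def ranking_loss(e_pos, e_neg, margin=1.0):
    """Hinge loss: zero once e_pos is below e_neg by at least `margin`."""
    return max(0.0, margin + e_pos - e_neg)

def corrupt(triplet, entities, rng):
    """Replace lhs or rhs by a random different entity (assumed scheme)."""
    lhs, rel, rhs = triplet
    if rng.random() < 0.5:
        return (rng.choice([e for e in entities if e != lhs]), rel, rhs)
    return (lhs, rel, rng.choice([e for e in entities if e != rhs]))

def epoch_loss(train, entities, energy, seed=0):
    """Sum of ranking losses over one pass through the training triplets."""
    rng = random.Random(seed)
    total = 0.0
    for pos in train:
        neg = corrupt(pos, entities, rng)
        # An SGD update on this pair's loss would go here.
        total += ranking_loss(energy(pos), energy(neg))
    return total
```

Here `energy` is any callable that maps a triplet to a real number, for example a closure around one of the SME parametrizations. Note that this sketch does not check whether a corrupted triplet happens to be a valid one.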

3 Empirical Evaluation 

To evaluate against existing methods, we performed link prediction experiments on benchmarks from the
literature, whose statistics are given in Table 1.

The link prediction task consists in predicting whether two entities should be connected by a given
relation type. This is useful for completing missing values of a graph, forecasting the behavior of a
network, etc., but also for assessing the quality of a representation. We evaluate our model on UMLS,
Nations and Kinships, following the setting introduced in [4]. The standard evaluation metric is the area
under the precision-recall curve (AUC). Table 2 presents results of SME along with those of RESCAL,
MRC, IRM, CP (CANDECOMP-PARAFAC) and LFM, which have been extracted from [5] [3].
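
For concreteness, the area under the precision-recall curve can be estimated as sketched below. The exact estimator used in the benchmark setting is not stated here, so this sketch assumes average precision (the mean of the precision values at the rank of each valid triplet), which is a common step-wise estimate. The function name `pr_auc` is hypothetical, ties between scores are not treated specially, and at least one positive label is assumed.

```python
def pr_auc(scores, labels):
    """Average precision: step-wise area under the precision-recall curve.

    scores: higher means more plausible; labels: 1 for valid, 0 otherwise.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    true_pos = 0
    area = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            true_pos += 1
            # Precision at the rank where recall increases by 1 / n_pos.
            area += true_pos / rank
    return area / n_pos

# Toy example (assumed values): two valid triplets ranked 1st and 3rd.
example = pr_auc([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

Since plausible triplets receive low energies, an energy-based model would be scored with the negated energies, e.g. `pr_auc([-e for e in energies], labels)`.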

The linear formulation of SME is outperformed by SME(bilinear) on all three tasks. The largest
differences, on Nations and Kinships, indicate that, for these problems, a joint interaction among
lhs, rel and rhs is crucial to represent the data well: relations cannot be simply decomposed as a
sum of bigrams. This is particularly true for the complex kinship systems of the Alyawarra. In
contrast, interactions within the UMLS network can be represented by simply considering the various
(entity, entity) and (entity, relation type) bigrams. Compared to other methods, SME(bilinear) performs
similarly to LFM on UMLS but is slightly outperformed on Nations. On Kinships, it is outperformed by
CP, RESCAL and LFM: on this dataset with complex ternary interactions, either the training process
of the tensor factorization methods, based on reconstruction, or the combination of bigram and trigram
interactions seems to be beneficial compared to our predictive approach. Compared to MRC, which does
not use a matrix-based encoding, SME(bilinear) is highly competitive.

Table 2: Comparisons of area under the precision-recall curve (AUC) for link prediction.

Method          UMLS            Nations         Kinships
SME(linear)     0.983 ± 0.004   0.777 ± 0.025   0.149 ± 0.003
SME(bilinear)   0.985 ± 0.003   0.865 ± 0.015   0.894 ± 0.011
LFM             0.990 ± 0.003   0.909 ± 0.009   0.946 ± 0.005
RESCAL          0.98            0.84            0.95
CP              0.95            0.83            0.94
MRC             0.98            0.75            0.85
IRM             0.70            0.75            0.66



Even if experimental results on these benchmarks are mixed, it is worth noting that, contrary to all
previous methods, SME models relation types as vectors lying in the same space as entities. From a
conceptual viewpoint, this is powerful, since it models any relation type as a standard entity (and
vice versa). Hence, SME is the only method that could be directly applied to data in which any entity can
also create relationships between other entities.